How Data Volume Affects Spark Based Data Analytics on a Scale-up Server


Abstract

The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on commodity machines, the impact of data volume on the performance of Spark-based data analytics in a scale-up configuration is not well understood. We present a deep-dive analysis of Spark-based applications on a large scale-up server machine. Our analysis reveals that Spark-based data analytics are DRAM bound and do not benefit from using more than 12 cores for an executor. As input data size grows, application performance degrades significantly because of a substantial increase in wait time during I/O operations and garbage collection, despite a 10% better instruction retirement rate (due to fewer L1 cache misses and higher core utilization). By matching the garbage collector to the applications' memory behaviour, we improve application performance by 1.6x to 3x.


Bibliographic details

Authors: Awan, Ahsan; Brorsson, Mats; Vlassov, Vladimir; Ayguadé Parra, Eduard
Year: 2024
Original format: PDF
Language: English

Similar literature

1. Spark-based data analytics of sequence motifs in large omics data [J]. Oluwafemi A. Sarumi, Carson K. Leung, Adebayo O. Adetunmbi. Procedia Computer Science. 2018, No. 1
2. Big data analytics with Spark: a practitioner's guide to using Spark for large scale data analysis [J]. Andre Maximo. Computing reviews. 2016, No. 8
3. Comparing datasets of volume servers to illuminate their energy use in data centers [J]. Heidi Fuchs, Arman Shehabi, Mohan Ganeshalingam, Energy efficiency. 2020, No. 3
4. How Data Volume Affects Spark Based Data Analytics on a Scale-up Server [C]. Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov, International conference on social informatics. 2016
5. Server-based data push architecture for data access performance optimization. [D]. Byna, Surendra. 2006
6. Big Data Analytics with Datalog Queries on Spark [O]. Alexander Shkapsky, Mohan Yang, Matteo Interlandi, -1
7. How Data Volume Affects Spark Based Data Analytics on a Scale-up Server [O]. Awan, Ahsan Javed, Brorsson, Mats, Vlassov, Vladimir, 2015
